Abstract

We investigate ways in which an algorithm can improve its expected performance by fine-tuning
itself automatically with respect to an arbitrary, unknown input distribution. We give such
self-improving algorithms for sorting and computing Delaunay triangulations. The highlights of
this work: (i) an algorithm to sort a list of numbers with optimal expected limiting complexity;
and (ii) an algorithm to compute the Delaunay triangulation of a set of points with optimal
expected limiting complexity. In both cases, the algorithm begins with a training phase during
which it adjusts itself to the input distribution, followed by a stationary regime in which the
algorithm settles to its optimized incarnation.

1 Introduction

The classical approach to analyzing algorithms draws a familiar litany of complaints: worst-case
bounds are too pessimistic in practice, say the critics, while average-case complexity too often rests
on unrealistic assumptions. The charges are not without merit. Hard as it is to argue that the only
permutations we ever want to sort are random, it is a different level of implausibility altogether
to pretend that the sites of a Voronoi diagram should always follow a Poisson process or that
ray tracing in a BSP tree should be spawned by a Gaussian. Efforts have been made to analyze
algorithms under more complex models (e.g., Gaussian mixtures, Markov model outputs) but with
limited success and lingering doubts about the choice of priors.

Ideally, one would like to compute a function $f$ with the help of a self-improving algorithm. Upon
receiving its first input instance $I_0$, such an algorithm would compute $f(I_0)$ with, say, good
worst-case guarantees and nothing more. Think of newly installed software that knows nothing about the
user and runs in its "vanilla" configuration. Subsequently, as it is called upon to compute $f(I_k)$ for
$k = 1, 2, \ldots$, the algorithm would gradually improve its performance through automatic fine-tuning.
Intuitively, if the $I_k$'s are drawn from a low-entropy distribution, the algorithm should be able to
spot that and learn to be more efficient.

The obvious analogy is data compression, which seeks to exploit low entropy to minimize encoding
size. The analog of Shannon's noiseless coding theorem would be here: Given an unknown
distribution $\mathcal{D}$, design a self-improving algorithm that converges to one with optimal
expected running time. The second goal, which is to optimize the convergence speed, is more strictly
speaking of a machine learning nature. One of the surprises of this work is how minimal distribution
learning suffices for dramatic self-improvement.

The starting point of this research is the observation that, trimmed of noise, real-world data is 
often of much lower entropy than size alone suggests. For example, Takens's embedding theorem as- 
serts that univariate time series obtained from deterministic dynamical systems can be geometrized 
canonically as a (usually) low-dimensional attractor set in finite-dimensional space [31]. Hidden 
Markov models for speech recognition can be remarkably effective with only a few thousand states. 
Anecdotal evidence can also be gleaned from the current trend toward personalization in the design 
of web tools (search engines, recommendation systems, etc). Input data is often lodged in a tiny
slice of input space that cannot be captured by closed-form distributions. To make predictions
about the slice is the essence of machine learning [18,23,27]. To take computational advantage of
the slice is what self-improving algorithms are all about.

Our Results The performance of a self-improving algorithm is measured with respect to an
unknown memoryless random source $\mathcal{D}$ of input instances. The algorithm is given instances
$I_0, I_1, \ldots$ drawn independently from $\mathcal{D}$, which it must solve one at a time in batch
mode with: (1) no prior knowledge of future instances, that is, $f(I_k)$ must be computed before any
of the $I_j$'s ($j > k$) are known; and (2) no prior knowledge of $\mathcal{D}$. The algorithm may
store auxiliary information to help improve its performance. (Unlike self-organizing data structures,
however, none of that information should be necessary for the algorithm to complete its task.) We use
$\mathcal{D}$ as shorthand for $\mathcal{D}^n$, the $n$-th member of an infinite ensemble of
distributions, one for each input size. After a training phase, we expect the algorithm to settle into
its steady state, whose expected running time is called its limiting complexity. Note that from the
user's perspective the only difference noticeable in the training phase is that the system might be a
little slower.

Our first result is, in some sense, the first truly optimal sorter. Given a source $\mathcal{D}$ of
real-number sequences $I = (x_1, \ldots, x_n)$, let $\pi(I)$ denote the permutation induced by the ranks
of the $x_i$'s, using the indices $i$ to break ties. The complexity of our algorithm depends on the
entropy $H(\pi(I))$. Note that this quantity can be much smaller than the entropy of the source itself
but can never exceed it.
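
To make the definition of $\pi(I)$ concrete, here is a minimal Python sketch that computes the rank permutation with index tie-breaking and estimates its entropy from a sample of inputs. The function names `rank_permutation` and `empirical_entropy` are illustrative and not part of the paper; the entropy is a plug-in estimate over observed samples, not the true $H(\pi(I))$ of the source.

```python
from collections import Counter
from math import log2


def rank_permutation(xs):
    """Return pi(I): the rank of each x_i in sorted order, ties broken by index."""
    order = sorted(range(len(xs)), key=lambda i: (xs[i], i))
    ranks = [0] * len(xs)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return tuple(ranks)


def empirical_entropy(samples):
    """Plug-in entropy estimate (in bits) of a list of hashable outcomes."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * log2(c / total) for c in counts.values())


def permutation_entropy(instances):
    """Entropy estimate of pi(I) over a sample of input instances."""
    return empirical_entropy([rank_permutation(inst) for inst in instances])
```

Inputs with very different values can share one permutation, which is why the entropy of $\pi(I)$ can be far below that of the source: for instance, `[1, 2]`, `[10, 20]` and `[5, 7]` all induce the same ranks.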

• Sorting: We give a self-improving algorithm with a limiting complexity of $O(n + H(\pi(I)))$
and prove that it is optimal. If the input $I = (x_1, \ldots, x_n)$ to be sorted is obtained by drawing
each $x_i$ independently (from a distribution that might depend on $i$), then for any $\varepsilon > 0$
the storage can be made $O(n^{1+\varepsilon})$ for an expected running time of
$O(n + \varepsilon^{-1} H(\pi(I)))$: this tradeoff is optimal for distributions of high enough entropy.
The training takes $O(n^{\varepsilon} \log n)$ rounds. We also show that independence, or at least some
restriction on the distribution of $I$, is necessary: there are input distributions for which the
storage must be exponential in $n$.

We take the concept of self-improving algorithms to the geometric realm and address the classical
problem of computing the Delaunay triangulation of a set of points in the Euclidean plane.

• Delaunay Triangulations: Assuming a distribution of $n$ points, each one drawn independently
from its own unknown (arbitrary) random source, we give a self-improving algorithm
of optimal limiting complexity $O(n + H(T(I)))$ for computing the Delaunay triangulation
$T(I)$ of input set $I$. We get time-space tradeoffs as well as lower bounds similar to those for
sorting.

The optimality of these results follows from Shannon's noiseless coding theorem, which states
that any binary encoding of an information source such as $\pi(I)$ must have an expected code
length of at least $H(\pi(I))$ (and similarly for $T(I)$ and $H(T(I))$). Any comparison-based algorithm
implies a coding scheme: the encoder sends the sequence of comparison outcomes, and the decoder
simulates the algorithm, using the transmitted sequence to determine comparison outcomes. Thus
any comparison-based algorithm must do an expected $\Omega(H(\pi(I)))$ comparisons, in addition to
the $\Omega(n)$ work needed to report the output.
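
The coding argument can be illustrated with a small Python sketch. It uses insertion sort purely as a stand-in for "any comparison-based algorithm"; the names `insertion_sort_order`, `encode` and `decode` are hypothetical. The encoder records one bit per comparison, and the decoder recovers the permutation by replaying the same algorithm against the bit stream without ever seeing the input values.

```python
def insertion_sort_order(n, less):
    """Sort indices 0..n-1 with insertion sort, asking less(a, b) for every comparison."""
    order = []
    for i in range(n):
        j = len(order)
        while j > 0 and less(i, order[j - 1]):
            j -= 1
        order.insert(j, i)
    return order


def encode(xs):
    """Run the sorter on xs and return (comparison bits, resulting index order)."""
    bits = []

    def less(a, b):
        outcome = (xs[a], a) < (xs[b], b)  # index breaks ties, as in the text
        bits.append(outcome)
        return outcome

    order = insertion_sort_order(len(xs), less)
    return bits, order


def decode(n, bits):
    """Recover the index order from the comparison bits alone."""
    stream = iter(bits)
    return insertion_sort_order(n, lambda a, b: next(stream))
```

Since the bit string determines the output permutation, its expected length is subject to the entropy lower bound, which is the source of the $\Omega(H(\pi(I)))$ term.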

Previous Work Related concepts have been studied before. List accessing algorithms and splay 
trees are textbook examples of how simple updating rules can speed up searching with respect to 
an adversarial request sequence [2,9,22,29,30]. It is interesting to note that self-organizing data 
structures were investigated over stochastic input models first [1,3,8,21,25,28]. It was the obser- 
vation [7] that memoryless sources for list accessing are not terribly realistic that partly motivated 
work on the adversarial models. It is highly plausible that both approaches are superseded by more 
sophisticated stochastic models: for example, hidden Markov models for gene finding or speech 
recognition or time-coherent models for self-customized BSP trees [5].

Algorithmic self-improvement differs from past work on self-organizing data structures and 
online computation in two fundamental ways: (i) self-improving algorithms operate offline and do 
not lend themselves to competitive analysis; (ii) they do not exploit structure within any given 
input but, rather, within the ensemble of input distributions. For example, suppose that the
distribution $\mathcal{D}$ consists of two random but fixed permutations, each one equally likely. Any
solution in the adaptive, self-organizing/adjusting framework requires $\Omega(n \log n)$ time. It is
trivial, however,
to design a self-improving algorithm of linear limiting complexity: sort the two permutations and 
store them; given any input instance, apply both permutations separately and output the permuted 
instance that is sorted. 
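
The two-permutation example can be sketched in a few lines of Python. The class name `TwoPermutationSorter` and the fallback to a general-purpose sort are assumptions added for illustration; the text only describes storing the two sorting orders and testing each one in linear time.

```python
def argsort(xs):
    """Indices that sort xs, ties broken by index."""
    return sorted(range(len(xs)), key=lambda i: (xs[i], i))


def is_sorted(seq):
    return all(seq[i] <= seq[i + 1] for i in range(len(seq) - 1))


class TwoPermutationSorter:
    def __init__(self, example_a, example_b):
        # "Training": sort one example of each permutation and remember the orders.
        self.orders = [argsort(example_a), argsort(example_b)]

    def sort(self, xs):
        # Steady state: apply each stored order and keep the one that is sorted.
        for order in self.orders:
            candidate = [xs[i] for i in order]
            if is_sorted(candidate):
                return candidate
        # Assumption: fall back to a standard sort if the input matches neither.
        return sorted(xs)
```

Each call performs two linear-time passes, so the limiting complexity is $O(n)$ even though either permutation, taken alone, may look perfectly random.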

Extensions of our memoryless model are easy to imagine. For example, a Bayesian version
of self-improvement would postulate a prior and treat the $I_k$'s as data conditioning a posterior
distribution. One could also consider time-varying distributions or Markov models. Of course, a
purely adversarial model might easily defeat self-improvement: it would observe how the improvement
proceeds and render it ineffective by tailoring distributions changing over time. Memoryless
sources are obviously the place to start any investigation on self-improvement. We also believe that
the assumption is far less restrictive than for online computation. Take speech for example. The
weakness of a memoryless model is that the next utterance is highly correlated with the previous 
ones: hence the use of Markov models. A self-improving algorithm would operate at the level of a 
sentence or a paragraph — not an utterance — where correlations are more diffuse and a memoryless 
source might be a good first approximation. 

2 A Self-Improving Sorter 

The self-improving sorter takes an input $I = (x_1, x_2, \ldots, x_n)$ of numbers drawn from a
distribution $\mathcal{D} = \prod_i \mathcal{D}_i$ (i.e., each $x_i$ is chosen independently from
$\mathcal{D}_i$). Let $\pi(I)$ denote the permutation induced by the ranks of the $x_i$'s, using the
indices $i$ to break ties. By an information theoretic argument, it is easy to see that any sorter
must take expected $\Omega(H(\pi(I)) + n)$ comparisons. This is, indeed, the bound that our
self-improving sorter achieves.

For simplicity, we begin with the steady-state algorithm and discuss the training phase later. We
also assume that the distribution $\mathcal{D}$ is known ahead of time and that we are allowed some
amount of preprocessing before having to deal with the first input instance (§2.1). Both assumptions
are unrealistic, so we show how to remove them to produce a bona fide self-improving sorter (§2.2). The
surprise is how strikingly little of the distribution needs to be learned for effective self-improvement.

Theorem 2.1. There exists a self-improving sorter of $O(n + H(\pi(I)))$ limiting complexity, for
any input distribution $\mathcal{D} = \prod_i \mathcal{D}_i$. Its worst case running time is
$O(n \log n)$. If the input $I = (x_1, \ldots, x_n)$ to be sorted is obtained by drawing each $x_i$
independently (from a distribution that might depend on $i$), then for any $\varepsilon > 0$ the storage
can be made $O(n^{1+\varepsilon})$ for an expected running time of $O(n + \varepsilon^{-1} H(\pi(I)))$:
this tradeoff is optimal for distributions of high enough entropy. The algorithm reaches its steady
state within $O(n^{\varepsilon} \log n)$ rounds.

Remark: Much research has been done on adaptive sorting [19], especially on algorithms that 
exploit near-sortedness. Our approach is conceptually different. As we mentioned in the previous 
section, we seek to exploit properties, not of individual inputs, but of their distribution. In par- 
ticular, our sorter runs in linear time for permutations drawn from a linear-entropy source, even 
though any individual input might be a perfectly random permutation. We are not aware of any 
previous algorithm that can achieve that. 

Can we hope for a result similar to Theorem 2.1 if we drop the independence assumption? The
short answer is no.

Lemma 2.2. There exists an input distribution $\mathcal{D}$ such that any comparison-based algorithm
that can sort a random input from $\mathcal{D}$ in expected $O(n + H(\pi(I)))$ time requires at least
$\Omega(2^{H(\pi(I))}\, n \log n)$ bits of storage. This holds for any value of the entropy
$H(\pi(I))$ that is smaller than $n \log n$ by a large enough constant factor.

Proof. Consider the set of all $n!$ permutations. Every subset $\Pi$ of $2^h$ permutations induces a
distribution $\mathcal{D}_\Pi$ defined by picking every permutation in $\Pi$ with equal probability and
none other. Note that the total number of distributions is
$\binom{n!}{2^h} > (n!/2^h)^{2^h}$ and $H(\mathcal{D}_\Pi) = h$, where $\mathcal{D}_\Pi$ is viewed as
the distribution on the output $\pi(I)$ induced by $\Pi$. Suppose there exists a comparison-based
algorithm $A_\Pi$ that sorts a random input from $\mathcal{D}_\Pi$ in expected time at most $c(n + h)$,
for some constant $c > 0$. By Markov's inequality this implies that at least half of the permutations
of $\Pi$ are sorted by $A_\Pi$ in at most $2c(n + h)$ comparisons. But, within $2c(n + h)$ comparisons,
the algorithm $A_\Pi$ can sort a set $P$ of at most $2^{2c(n+h)}$ permutations. Therefore, any other
$\Pi'$ such that $A_{\Pi'} = A_\Pi$ will have to draw at least half of its elements from $P$. This
limits the number of such $\Pi'$ to

$$\binom{n!}{2^h/2} \binom{2^{2c(n+h)}}{2^h/2} < (n!)^{2^{h-1}}\, 2^{c(n+h)2^h}.$$

This means that the number of distinct algorithms needed exceeds

$$\frac{(n!/2^h)^{2^h}}{(n!)^{2^{h-1}}\, 2^{c(n+h)2^h}} > (n!)^{2^{h-1}}\, 2^{-(c+1)(n+h)2^h} = 2^{\Omega(2^h n \log n)},$$

assuming that $h/(n \log n)$ is small enough. An algorithm is entirely specified by a string of bits;
therefore at least one such algorithm must require storage logarithmic in the previous bound. □

Fredman [20] gives a comparison-based algorithm that can optimally sort any distribution of
permutations, but uses an exponentially large data structure to decide which comparisons to perform.
This result shows that the storage used by Fredman's algorithm is essentially optimal.

2.1 Sorting with Full Knowledge

We consider the problem of sorting $I = (x_1, \ldots, x_n)$, where each $x_i$ is drawn from a
distribution $\mathcal{D}_i$, which is specified by a vector $(p_{i,1}, \ldots, p_{i,N})$, where
$p_{i,j} = \Pr[x_i = j]$.^1 We can assume without loss of generality that all the $x_i$'s are distinct.
(If not, simply replace $x_i$ by $n x_i + i - 1$ for tie-breaking purposes and enlarge $N$ to
$n(N + 1)$. All probabilities and entropies remain the same.)

The first step of the self-improving sorter is to sample $\mathcal{D}$ a few times (the training phase)
and create a "typical" instance to divide the real line into a set of disjoint, sorted intervals. Next,
given some input $I$, the algorithm sorts $I$ by using the typical instance, placing each input number
in its respective interval. All numbers falling into the same interval are then sorted in a standard
fashion. The algorithm needs a few supporting data structures.

• The $V$-LIST: Fix an integer parameter $\lambda = c \log n$, for large enough $c$, and sample
$\lambda$ input instances from $\prod_i \mathcal{D}_i$. Form their union and sort the resulting
$\lambda n$-element multiset into a single list $u_1 \le \cdots \le u_{\lambda n}$. Next, extract from
it every $\lambda$-th item and form the list $V = (v_0, \ldots, v_{n+1})$, where $v_0 = 0$,
$v_{n+1} = \infty$, and $v_i = u_{i\lambda}$ for $1 \le i \le n$. Keep the $V$-list in a sorted table as
a snapshot of a "typical" input instance. We will prove the remarkable fact that, with high
probability, locating each $x_i$ in the $V$-list is linearly equivalent to sorting $I$. We cannot afford
to search the $V$-list directly, however. To do that, we need auxiliary search structures.

• The $D_i$-TREES: For any $i \ge 1$, let $B_i^V$ be the predecessor^2 of a random $y$ from
$\mathcal{D}_i$ in the $V$-list, and let $H_i^V$ be the entropy of $B_i^V$. The $D_i$-tree is an optimum
binary search tree [2^ over the keys of the $V$-list, where the access probability of $v_k$ is
$\sum_j \{\, p_{i,j} \mid v_k \le j < v_{k+1} \,\}$,^3 for any $0 \le k \le n$: the same distribution
used to define $B_i^V$. This allows us to compute $B_i^V$ using $O(H_i^V + 1)$ expected comparisons.
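
The $V$-list construction and the bucketing step can be sketched as follows in Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the sentinels $v_0$ and $v_{n+1}$ are left implicit, and `bisect` (plain binary search) stands in for the $D_i$-trees, so the sketch shows the partitioning logic but not the entropy-sensitive search cost.

```python
import bisect


def build_v_list(samples):
    """Merge lam training instances of size n and keep every lam-th element."""
    lam = len(samples)
    merged = sorted(x for instance in samples for x in instance)
    return merged[lam - 1::lam]  # u_lam, u_2lam, ..., u_{n*lam}


def insertion_sort(bucket):
    """Sort a (presumably small) bucket in place."""
    for i in range(1, len(bucket)):
        x = bucket[i]
        j = i - 1
        while j >= 0 and bucket[j] > x:
            bucket[j + 1] = bucket[j]
            j -= 1
        bucket[j + 1] = x


def sort_with_v_list(xs, v):
    """Locate each x in the V-list, then sort each group naively."""
    buckets = [[] for _ in range(len(v) + 1)]
    for x in xs:
        # Predecessor index: number of V-list entries <= x.
        # Assumption: binary search replaces the per-coordinate D_i-tree.
        buckets[bisect.bisect_right(v, x)].append(x)
    out = []
    for bucket in buckets:
        insertion_sort(bucket)
        out.extend(bucket)
    return out
```

The output is correct for any $V$-list; the quality of the $V$-list only affects how large the buckets are, and hence how much work insertion sort has to do.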

The total space used is $O(n^2)$. This can be decreased to $O(n^{1+\varepsilon})$ for any
$\varepsilon > 0$; we describe how later. As we explained earlier, the input $I$ is sorted by a
two-phase procedure. First we locate each $x_i$ in the $V$-list using the $D_i$-trees. This allows us
to partition $I$ into groups $G_1 < G_2 < \cdots$ of $x_i$'s sharing the same predecessor in the
$V$-list. The next phase involves going through each $G_j$ and sorting its elements naively, say using
insertion sort. The first phase of the algorithm takes $O(n + \sum_i H_i^V)$ expected time.^4 What
about the second? Its complexity is $O(n)$, as follows from:

Lemma 2.3. With probability > 1 — n^"^ over the construction of the V-list 

$$\mathrm{E}_{\mathcal{D}}\big[\, |\{\, i \mid v_k \le x_i < v_{k+1} \,\}|^2 \,\big] = O(1), \quad \text{for all } 0 \le k \le n.$$

^1 All the arguments we give shall hold directly (and obviously) even if the $\mathcal{D}_i$'s are
continuous. We have made this assumption for ease of presentation.

^2 Throughout this paper, the predecessor of $y$ in a list refers to the index of the largest list
element $\le y$; it does not refer to the element itself.

^3 If the $\mathcal{D}_i$'s were continuous, then this would be defined as the probability of $x_i$
falling in $[v_k, v_{k+1})$.

^4 The $H_i^V$'s themselves are random variables depending on the choice of the $V$-list. Therefore,
this is a conditional expectation.

Proof. Remember that the $V$-list was formed by taking certain elements from a list
$u_1 \le \cdots \le u_{\lambda n}$, where $\lambda = c \log n$. Consider two points $u_i$ and $u_j$.
Note that all the other $\lambda n - 2$ points are independent of these two points. For every
$\ell \notin \{i, j\}$, let $Y_\ell^{(t)}$ be the indicator random variable for the event that
$u_\ell \in [u_i, u_j) = t$. Let $Y^{(t)} = \sum_\ell Y_\ell^{(t)}$. Since all the $Y_\ell^{(t)}$'s are
independent, by Chernoff's bound [4], for any $\beta \in (0, 1]$,

$$\Pr\big[ Y^{(t)} < (1 - \beta)\, \mathrm{E}[Y^{(t)}] \big] < e^{-\beta^2 \mathrm{E}[Y^{(t)}]/2}.$$

With probability at least 1 — ra^^, if E[y*^*)] > 4c log n, then y*^*) > 2c log n. We can apply the same 
argument for any pair Ui,Uj. Taking a union bound over all pairs, we get that with probability 
> 1 — if for the pair t, E[F(*)] > 4c log n, then F^*) > 2c log n. 

The $V$-list is constructed such that for $t_k = [v_k, v_{k+1})$, $Y^{(t_k)} \le c \log n$. Let
$X_i^{(t_k)}$ be the indicator random variable for the event that $x_i \in_R \mathcal{D}_i$ lies in
$t_k$, and $X^{(t_k)} = \sum_i X_i^{(t_k)} = |\{\, i \mid v_k \le x_i < v_{k+1} \,\}|$. Note that
$\mathrm{E}[Y^{(t_k)}] \ge (\log n - 2)\, \mathrm{E}[X^{(t_k)}]$ and therefore
$\mathrm{E}[X^{(t_k)}] = O(1)$. Now we apply the following standard claim to $X^{(t_k)}$:

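Claim 2.4. Let $Z = \sum_i Z_i$ be a sum of independent positive random variables with $Z_i = O(1)$
for all $i$ and $\mathrm{E}[Z] = O(1)$. Then $\mathrm{E}[Z^2] = O(1)$.

Proof. By linearity of expectation and the independence of the $Z_i$'s,

$$\mathrm{E}[Z^2] = \sum_i \mathrm{E}[Z_i^2] + 2 \sum_{i<j} \mathrm{E}[Z_i]\, \mathrm{E}[Z_j] \le \sum_i O(\mathrm{E}[Z_i]) + \Big( \sum_i \mathrm{E}[Z_i] \Big)^2 = O(1).$$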

□ 
□ 

We have shown that the algorithm takes $O(n + \sum_i H_i^V)$ time (given a fixed $V$-list) plus an
$O(n)$ additive expected term (over $\mathcal{D}$ and $V$). We now show that this running time is
indeed optimal.

Lemma 2.5.

$$\sum_i H_i^V = O(n + H(\pi(I))).$$

We will actually show this to be the case for any linear sized sorted list $V$. We will need a
basic claim, which shall be proven for completeness, about the joint entropy of independent random
variables.

Claim 2.6. Let $H(Z_1, \ldots, Z_n)$ be the joint entropy of independent random variables
$Z_1, \ldots, Z_n$. Then

$$H(Z_1, \ldots, Z_n) = \sum_i H(Z_i).$$

Proof. This is a consequence of the independence of the $Z_i$'s. We will prove this by induction over
the number of variables the joint entropy includes. For the base case, $H(Z_1) = H(Z_1)$ by definition.
Assume inductively that $H(Z_1, \ldots, Z_k) = \sum_{i=1}^{k} H(Z_i)$. By the chain rule for conditional
entropy^5 and independence (which gives $H(Z_1, \ldots, Z_k \mid Z_{k+1}) = H(Z_1, \ldots, Z_k)$),

$$H(Z_1, \ldots, Z_{k+1}) = H(Z_1, \ldots, Z_k \mid Z_{k+1}) + H(Z_{k+1}) = \sum_{i=1}^{k+1} H(Z_i).$$

□ 
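
As a quick numerical sanity check of Claim 2.6, the following Python sketch compares the joint entropy of a product distribution, computed by brute force over all outcomes, with the sum of the marginal entropies. The function names are illustrative only.

```python
from itertools import product
from math import isclose, log2


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)


def joint_entropy_independent(marginals):
    """Joint entropy of independent variables, enumerating every joint outcome."""
    total = 0.0
    for combo in product(*marginals):
        p = 1.0
        for q in combo:
            p *= q  # independence: joint probability is the product
        if p > 0:
            total -= p * log2(p)
    return total


def check_additivity(marginals):
    """Return True if the joint entropy matches the sum of marginal entropies."""
    lhs = joint_entropy_independent(marginals)
    rhs = sum(entropy(m) for m in marginals)
    return isclose(lhs, rhs, rel_tol=1e-9)
```

For example, `check_additivity([[0.5, 0.5], [0.25, 0.75], [0.1, 0.2, 0.7]])` compares the two sides for three independent variables.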

^5 Given two random variables $X$ and $Y$ over supports $\mathcal{X}$ and $\mathcal{Y}$, the conditional
entropy is $H(Y \mid X) = \sum_{x \in \mathcal{X}} \Pr(X = x)\, H(Y \mid X = x)$. The chain rule tells
us that $H(Y, X) = H(Y \mid X) + H(X)$.

Thus, it suffices to relate the joint entropy of $B^V := (B_1^V, \ldots, B_n^V)$ with the entropy of
$\pi(I)$. This is done via a neat trick:

Claim 2.7. Let $\mathcal{D}$ be a distribution on a universe $U$, and let $X : U \to \mathcal{X}$ and
$Y : U \to \mathcal{Y}$ be two random variables. Suppose that the function
$f : U \times X(U) \to \mathcal{Y}$ defined by $f : (I, X(I)) \mapsto Y(I)$ can be computed with $O(n)$
expected comparisons (where the expectation is over $\mathcal{D}$). Then $H(Y) = O(n + H(X))$, where
all the entropies are with respect to $\mathcal{D}$.

Proof. By a classical result from information theory (e.g., [16, Theorem 5.4.1]), any unique encoding
$s : X(U) \to \{0, 1\}^*$ of $X(U)$ has expected code length
$\mathrm{E}_{\mathcal{D}}[|s(X(I))|] \ge H(X)$, and there exists an encoding $s^*$ that has expected
code length $O(H(X))$. Using $f$, this can be converted into an encoding $t$ of $Y(U)$. Indeed, for
every $I$, $Y(I)$ can be uniquely identified using $s^*(X(I))$ and additional bits that represent the
outcomes of the comparisons for the computation of $f(I, X(I))$. By taking a shortest such string for
each element of $Y(U)$, we obtain a unique encoding $t$ for $Y(U)$ with expected code length
$\mathrm{E}_{\mathcal{D}}[|t(Y(I))|] = O(n + \mathrm{E}_{\mathcal{D}}[|s^*(X(I))|]) = O(n + H(X))$.
Since any encoding of $Y(U)$ has expected code length at least $H(Y)$, the claim follows. □

Lemma 2.8.

$$H(B^V) = O(n + H(\pi(I))).$$

Proof. Apply Claim 2.7 with $U = \{1, \ldots, n\}$, $X(I) = \pi(I)$, the permutation induced by input
$I$, and $Y(I) = B^V$. The function $f$ can be computed in linear time by using $\pi(I)$ to sort $I$ and
then merging this sorted list with $V$. □

Lemma 2.5 now follows from Claim 2.6 and Lemma 2.8. This completes the proof for the optimality of
the time taken by the sorter. We now show that the storage can be reduced to $O(n^{1+\varepsilon})$,
for any $\varepsilon > 0$. The main idea is to prune each of the $D_i$-trees to depth
$\varepsilon \log n$. This ensures that each of these trees has size $O(n^{\varepsilon})$ and the total
storage used is $O(n^{1+\varepsilon})$. We also construct a completely balanced binary tree $T$ for
searching in the $V$-list. Now, when we wish to search for $x_i$ in the $V$-list, we first search using
the pruned $D_i$-tree. At the end, if we reach a leaf of the unpruned $D_i$-tree, we stop since we have
found the right interval of the $V$-list which contains $x_i$. On the other hand, if the search in the
$D_i$-tree was unsuccessful, then we use $T$ for searching.

In the first case, the time taken for searching is simply the same that it would have taken with
unpruned $D_i$-trees. In the second case, the time taken is $O((1 + \varepsilon) \log n)$. But note
that the time taken with unpruned $D_i$-trees is at least $\varepsilon \log n$ (since the search on the
pruned $D_i$-tree failed, we must have reached some internal node of the unpruned tree). Therefore, the
extra time taken is only an $O(\varepsilon^{-1})$ factor of the original time. As a result, the space
can be reduced to $O(n^{1+\varepsilon})$ with only a constant factor increase in running time (for any
fixed $\varepsilon > 0$).

We can show that the storage cannot be reduced to linear. In fact, the tradeoff between the
$O(n^{1+\varepsilon})$ storage bound and an expected running time off the optimal by a factor of
$O(1/\varepsilon)$ is optimal.

Lemma 2.9. For any c large enough and any h < |nlogn, there is a distribution T> = Yli/^i 
of entropy $h$ such that any comparison-based algorithm that can sort a random permutation from
$\mathcal{D}$ in expected time $c(h + n)$ requires a data structure of bit size
$\Omega(\kappa\, n \log n)$, where $\kappa = 2^{h/n}$.

Proof. The proof is a specialization of the argument used for proving Lemma 2.2. Let
$\kappa = 2^{h/n}$. We define $\mathcal{D}_i$ by choosing $\kappa$ distinct integers in $[1, n]$ and
making them equally likely to be picked as $x_i$. This leads to
$\binom{n}{\kappa}^n > (n/\kappa)^{\kappa n}$ choices of distinct distributions $\mathcal{D}$. Suppose
that there is a data structure of size $s$ that can accommodate any such distribution with an expected
running time of at most $c(h + n)$. Then one such data structure $W$ must be able to accommodate this
running time for a set $\mathcal{G}$ of at least $(n/\kappa)^{\kappa n} 2^{-s}$ distributions
$\mathcal{D}$. Any input instance that is sorted in at most $2c(h + n)$ time by this data structure is
called easy: the set of easy instances is denoted by $\mathcal{E}$.

Each $\mathcal{D}_i$ is characterized by a vector $v_i = (a_{i,1}, \ldots, a_{i,\kappa})$, so that
$\mathcal{D}$ itself is specified by $v = (v_1, \ldots, v_n) \in \mathbb{R}^{n\kappa}$. (From now on,
we view $v$ both as a vector and a distribution of input instances.) Define the $j$-th projection of
$v$ as $v^j = (a_{1,j}, \ldots, a_{n,j})$. Even if $v \in \mathcal{G}$, it could well be that none of
the projections of $v$ are easy. However, if we consider the projections obtained by permuting the
coordinates of each vector $v_i = (a_{i,1}, \ldots, a_{i,\kappa})$ in all possible ways, we enumerate
each input instance from $\mathcal{D}$ the same number of times. Note that applying these permutations
gives us different vectors which also represent $\mathcal{D}$. Since the expected time to sort an input
chosen from $\mathcal{D} \in \mathcal{G}$ is at most $c(h + n)$, by Markov's inequality, there exists a
choice of permutations (one for each $1 \le i \le n$) for which at least half of the projections of the
vector obtained by applying these permutations are easy.

Let us count how many distributions have a vector representation with a choice of permutations
placing half its projections in $\mathcal{E}$. There are fewer than $|\mathcal{E}|^{\kappa/2}$ choices
of such instances and, for any such choice, each $v_i = (a_{i,1}, \ldots, a_{i,\kappa})$ has half its
entries already specified, so the remaining choices are fewer than $n^{\kappa n/2}$. This gives an
upper bound of $n^{\kappa n/2} |\mathcal{E}|^{\kappa/2}$ on the number of such distributions. This
number cannot be smaller than $|\mathcal{G}| \ge (n/\kappa)^{\kappa n} 2^{-s}$; therefore

$$n^{\kappa n/2}\, |\mathcal{E}|^{\kappa/2} \ge (n/\kappa)^{\kappa n}\, 2^{-s}. \qquad (1)$$

In a comparison-based decision tree model, each input instance is associated with the leaf of a
binary decision tree of depth at most $2c(h + n)$, i.e., with a tree of at most $2^{2c(h+n)}$ leaves.
This would give us a lower bound on $s$ if each instance was assigned a distinct leaf. But this may
not be the case. However, we have a collision bound, saying that at most $4^n$ instances can be mapped
to the same leaf. This implies that $|\mathcal{E}|\, 4^{-n} \le 2^{2c(h+n)}$; and by (1),
$s = \Omega(\kappa\, n \log n)$; hence the lemma.

To prove the collision bound, we use the tie-breaking rule mentioned earlier:
$x_i \leftarrow n x_i + i - 1$. It is clear that two instances mapping to two distinct permutations
must lead to two different leaves of the decision tree. So the only question left is to bound the
number of instances mapping to a given permutation. Let $x = (x_1, \ldots, x_n)$ be an input instance
(no tie-breaking). Represent the ground set of this instance as an $n$-bit vector $\alpha$
($\alpha_i = 1$ if some $x_j = i$, else $\alpha_i = 0$). Let $x$ be sorted to give the vector
$y = (y_1, \ldots, y_n)$. For $i = 2, \ldots, n$, let $\beta_i = 1$ if $y_i = y_{i-1}$, else
$\beta_i = 0$. Given the vectors $\alpha$, $\beta$ and the induced permutation, the input instance $x$
can be recovered. This proves the collision bound. □

□ 
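
The recovery step behind the collision bound can be checked with a short Python sketch. The names `encode_instance` and `decode_instance` are hypothetical; the sketch assumes, as in the proof, an instance of $n$ integers in $[1, n]$, and represents the induced permutation as the list of indices in sorted order (ties broken by index).

```python
def encode_instance(xs):
    """Return (alpha, beta, order) for an instance of n integers in [1, n]."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: (xs[i], i))
    ys = [xs[i] for i in order]
    present = set(xs)
    alpha = [1 if value in present else 0 for value in range(1, n + 1)]
    beta = [1 if ys[i] == ys[i - 1] else 0 for i in range(1, n)]
    return alpha, beta, order


def decode_instance(alpha, beta, order):
    """Rebuild the instance from the ground set, the repeat flags and the permutation."""
    distinct = [value for value, bit in enumerate(alpha, start=1) if bit]
    ys = [distinct[0]]
    d = 0
    for repeat in beta:
        if not repeat:
            d += 1  # move on to the next distinct value
        ys.append(distinct[d])
    xs = [0] * len(order)
    for rank, i in enumerate(order):
        xs[i] = ys[rank]
    return xs
```

Since $\alpha$ has $n$ bits and $\beta$ has fewer than $n$ bits, at most $2^n \cdot 2^n = 4^n$ instances can share a permutation, which matches the bound used in the proof.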

2.2 Learn & Sort 

The $V$-list is built in the first $O(\log n)$ rounds. The $D_i$-trees will be built after
$O(n^{\varepsilon} \log n)$ additional rounds, which will complete the training phase. During that
phase, sorting is handled via, say, mergesort to guarantee $O(n \log n)$ complexity. The training part
per se consists of learning basic information about $\mathcal{D}_i$ for each $i$. For notational
simplicity, fix $i$ and let $p_k = \Pr_{\mathcal{D}_i}[v_k \le y < v_{k+1}]$.
Let $M = c n^{\varepsilon} \log n$, for a large enough constant $c$. For any $k$, let $\chi_k$ be the
number of times, over the first $M$ rounds, that $v_k$ is found to be the $V$-list predecessor of
$x_i$. (We use standard binary search to compute predecessors in the training phase.) Finally, define
the $D_i$-tree to be a weighted binary search tree defined over all the $v_k$'s such that
$\chi_k > M n^{-\varepsilon}$. Recall that the defining property of such a tree is that the node
associated with a key of weight $\chi_k$ is at depth $O(\log(\chi/\chi_k))$, where
$\chi = \sum_k \chi_k$. We apply this procedure for each $i = 1, \ldots, n$.

This $D_i$-tree is essentially the pruned version of the one mentioned earlier. Like before, its
size is $O(M/(M n^{-\varepsilon})) = O(n^{\varepsilon})$. The way we use it is similar to what we
described, with a few minor differences. For completeness, we go over it again: given $x_i$, we perform
a binary search down the $D_i$-tree, stopping as soon as we encounter a node whose associated key $v_k$
is such that $x_i \in [v_k, v_{k+1})$, in which case we have the predecessor of $x_i$ in the $V$-list
and we are done. If we reach the bottom of the $D_i$-tree without success, we simply perform a standard
binary search in the $V$-list.
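
The learning step for one fixed coordinate $i$ can be sketched in Python as below. This is an illustration under assumptions, not the paper's construction: all function names are hypothetical, the tree is built with a simple weight-balancing rule (pick the root that best splits the total weight) as a stand-in for a weighted binary search tree, and `v` holds $v_1, \ldots, v_n$ with the sentinels left implicit, so that index $k$ denotes the interval $[v_k, v_{k+1})$.

```python
import bisect


def count_predecessors(v, training_values):
    """chi[k] = number of training values whose V-list predecessor index is k."""
    chi = [0] * (len(v) + 1)
    for x in training_values:
        chi[bisect.bisect_right(v, x)] += 1
    return chi


def build_tree(items):
    """items: (k, weight) pairs sorted by k. Returns nested tuples (k, left, right)."""
    if not items:
        return None
    total = sum(w for _, w in items)
    acc, best, best_gap = 0, 0, None
    for idx, (_, w) in enumerate(items):
        gap = abs(acc - (total - acc - w))  # imbalance if this item is the root
        if best_gap is None or gap < best_gap:
            best, best_gap = idx, gap
        acc += w
    k = items[best][0]
    return (k, build_tree(items[:best]), build_tree(items[best + 1:]))


def learn_tree(v, training_values, threshold):
    """Keep only the intervals hit more than `threshold` times."""
    chi = count_predecessors(v, training_values)
    return build_tree([(k, c) for k, c in enumerate(chi) if c > threshold])


def locate(x, v, tree):
    """Search the learned tree; fall back to binary search in the V-list."""
    node = tree
    while node is not None:
        k, left, right = node
        lo = v[k - 1] if k > 0 else float("-inf")
        hi = v[k] if k < len(v) else float("inf")
        if lo <= x < hi:
            return k
        node = left if x < lo else right
    return bisect.bisect_right(v, x)
```

In the notation of the text, `threshold` plays the role of $M n^{-\varepsilon}$ and `training_values` holds the $M$ observed values of $x_i$. Frequent intervals sit near the root, while rare ones cost one failed descent plus a binary search.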

Lemma 2.10. Fix $i$. With probability at least $1 - 1/n^2$, for any $k$, $p_k > n^{-\varepsilon}$
implies that $M p_k/2 < \chi_k < 3 M p_k/2$.

Proof. The expected value of $\chi_k$ is $M p_k$. If $p_k = \Omega(n^{-\varepsilon})$ then, by
Chernoff's bound [4] (pages 267-268), the count $\chi_k$ deviates from its expectation by more than
$a = M p_k/2$ with probability less than $n^{-b}$,
for some constant $b$ growing linearly with $c$. A union bound (over all $k$) completes the proof. □

Suppose the condition of the lemma holds for each $k$ (and fixed $i$). We show now that the
expected search time is $O(\varepsilon^{-1} H_i^V + 1)$. Consider each element in the sum
$H_i^V = \sum_k p_k \log p_k^{-1}$.

• $p_k > n^{-\varepsilon}$: if $v_k$ is in the $D_i$-tree, then the cost of the search is
$O(\log(\chi/\chi_k))$, so its contribution to the expected running time is
$O(p_k \log(\chi/\chi_k))$. By the lemma, this is also $O(p_k(1 + \log p_k^{-1}))$, as desired. If
$v_k$ is not in the $D_i$-tree, then the search is unsuccessful and costs $O(\log n)$ time: its
contribution to the expected running time is $O(p_k \log n)$. Not being in the tree, however, means
that $\chi_k \le M n^{-\varepsilon}$; hence $p_k < 2 n^{-\varepsilon}$ and the contribution is
$O(\varepsilon^{-1} p_k \log p_k^{-1})$.

• $p_k \le n^{-\varepsilon}$: the search time is always $O(\log n)$; hence the contribution to the
expected running time is $O(\varepsilon^{-1} p_k \log p_k^{-1})$.

By summing up over all $k$, we find that the expected search time is
$O(\varepsilon^{-1} H_i^V + 1)$. This assumes the conditions of the lemma. But these are satisfied for
all $i$ with probability at least $1 - 1/n$. This leaves a probability $1/n$ that the training fails
and we are stuck with $\Theta(n \log n)$ sorting; note that we do not try to detect failure. But this
adds only an additive sublinear term to the expected complexity and is therefore negligible.

3 Delaunay Triangulations 

Let $I = (x_1, \ldots, x_n)$ denote an input instance, where each $x_i$ is a point in the plane,
generated by a point distribution $\mathcal{D}_i$. The distributions are arbitrary, and may be
continuous, although we never explicitly use such a condition. Each $x_i$ is independent of the others,
and so the input $I$ is drawn from the product distribution $\mathcal{D} = \prod_i \mathcal{D}_i$. In
each round, a new input $I$ is drawn from $\mathcal{D}$, and we wish to compute the Delaunay
triangulation of $I$. We are in the comparison model, so any operation consists of evaluating a
polynomial at some point (more details about this are given in §3.2). Although it is not critical, for
the sake of simplicity, we will assume that the points of $I$ are in general position, which is true
with probability one when all the $\mathcal{D}_i$'s are continuous. Also we will assume that there is a
bounding triangle such that all points always lie inside this triangle.

The distribution $\mathcal{D}$ also induces a (discrete) distribution on the set of Delaunay
triangulations, viewed as labelled graphs on the vertex set $[1, n]$. Consider the entropy of this
distribution: for each graph $G$ on $[1, n]$, let $p_G$ be the probability that it represents the
Delaunay triangulation of $I \in_R \mathcal{D}$. Abusing notation, let the output entropy
$H(T(I)) := -\sum_G p_G \log p_G$. By information-theoretic arguments, this quantity is a lower bound on
the expected time required by any comparison-based algorithm to compute the Delaunay triangulation of
$I \in_R \mathcal{D}$. An optimal algorithm will be one that has an expected running time of
$O(n + H(T(I)))$. Our main result is the following:

Theorem 3.1. For inputs $(x_1, x_2, \ldots, x_n)$ drawn from the product distribution
$\mathcal{D} = \prod_i \mathcal{D}_i$, and for any constant $\varepsilon > 0$, there is a
self-improving algorithm for finding the Delaunay triangulations of the $x_i$ that has a learning phase
of $O(n^{\varepsilon})$ rounds and uses $O(n^{1+\varepsilon})$ space.^6 The limiting running time is
$O(\varepsilon^{-1}(n + H(T(I))))$, and therefore optimal.

From the linear time reduction of sorting to computing Delaunay triangulations, the lower
bounds for sorting carry over to Delaunay triangulations. As an immediate corollary of Lemma 2.2
we get

Corollary 3.2. There exists an input distribution $\mathcal{D}$ such that any self-improving algorithm
computing the Delaunay triangulation of inputs from $\mathcal{D}$ in $O(n + H(T(I)))$ limiting running
time requires $\Omega(2^n)$ space.

Furthermore, by Lemma 2.9, the time-space tradeoff we provide is essentially optimal.

3.1 The algorithm

We describe the algorithm in two parts. The first part explains the learning phase and the data
structures that are constructed (§3.1.1). Then, we explain how these data structures are used
to speed up the computation in the limiting phase (§3.1.2). As before, the expected running time
will be expressed in terms of certain parameters of the data structures obtained in the learning
phase. In the next section (§3.2), we will prove that these parameters are comparable to the output
entropy $H(T(I))$. First, we will assume that the distributions $\mathcal{D}_i$ are known to us, and the data
structures described will use $O(n^2)$ space. Section 3.3 repeats the arguments of Section 2.2 to give
the space-time tradeoff bounds of Theorem 3.1.

As outlined in Figure 1, our algorithm for Delaunay triangulation is roughly a generalization of
our algorithm for sorting. This is not surprising, but note that while the steps of the two algorithms, 
and their analyses, are analogous, in several cases a step for sorting is trivial, but the corresponding 
step for Delaunay triangulation uses some relatively recent and sophisticated prior work. 

* The total time required for the learning phase is also $O(n^{1+\varepsilon})$.

Sorting                                                | Delaunay triangulation
-------------------------------------------------------+------------------------------------------------------
Intervals $(x_i, x_{i'})$ containing no values of $I$  |
$\log n$ training instance points with the same ...    |
Optimal weighted binary trees $D_i$                    | Entropy-optimal planar point location data ...
Sorting within buckets                                 | Triangulation within $\mathcal{V}(Z_s) \cap s$ (Claim 3.8)
                                                       | $T(V \cup I)$
Build sorted $I$ from sorted $V \cup I$ (trivial)      | Build $T(I)$ from $T(V \cup I)$ [12]
                                                       | (analysis) merge $T(V)$ and $T(I)$ [10]
(analysis) recover indices from sorted $I$ (trivial)   | (analysis) recover triangles $B_i$ in $T(V)$ from
                                                       | $T(I)$ (Lemma 3.11)

Figure 1: Delaunay triangulation algorithm as a generalization of the sorting algorithm



3.1.1 Learning Phase 

For each round in the learning phase, we use a standard algorithm to compute the output Delaunay 
triangulation. We also perform some extra computation to build some data structures that will 
allow speedup in the limiting phase. 

The learning phase is as follows. Take the first $\lambda := c \log n$ input lists
$I_1, I_2, \ldots, I_\lambda$, where $c$ is a sufficiently large constant. Merge them into one list
$\hat{I}$ of $\lambda n = cn \log n$ points. Setting $\varepsilon := 1/n$, find an $\varepsilon$-net
$V \subseteq \hat{I}$ for the set of all open disks. In other words, find a set $V$ such that for any
open disk $C$ that contains more than $\varepsilon \lambda n = c \log n$ points of $\hat{I}$, $C$ contains
at least one point of $V$. Matoušek et al. [24] show that there exist $\varepsilon$-nets of size
$O(1/\varepsilon)$ for disks, which here is $O(n)$. Furthermore, a construction and analysis similar to
that of Clarkson and Varadarajan [15] yields a randomized construction (with polynomially small error
probability) that takes $n (\log n)^{O(1)}$ time.
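
The defining property of the net can be illustrated with a brute-force check. This Python sketch is not
the construction of [15] or [24]: it only verifies the property against a finite, user-supplied list of
candidate disks, which is a simplifying assumption, and all names are illustrative.

```python
def disk_contains(center, radius, p):
    """True if point p lies in the open disk with the given center and radius."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    return dx * dx + dy * dy < radius * radius


def is_net_for_disks(points, net, disks, threshold):
    """Check the net property on a finite family of disks.

    Every disk in `disks` (pairs (center, radius)) that contains more than
    `threshold` of the `points` must contain at least one point of `net`.
    """
    for center, radius in disks:
        inside = sum(disk_contains(center, radius, p) for p in points)
        if inside > threshold and not any(
            disk_contains(center, radius, v) for v in net
        ):
            return False
    return True
```

In the setting of the text, `points` would play the role of the merged training list and `threshold`
the role of $c \log n$.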

We construct the Delaunay triangulation of $V$, which we denote by $T(V)$. This is the analogue
of the $V$-list for the self-improving sorter. We build an optimal planar point location structure
(called $D$) for $T(V)$: given a point, we can find in $O(\log n)$ time the triangle of $T(V)$ that it lies in.
Define the random variable $B_i$ to be the triangle of $T(V)$ that $x_i$ falls into.† Now let the entropy
of $B_i$ be $H_i^V$. If the probability that $x_i$ falls in triangle $t$ of $T(V)$ is $p_i^t$, then
$H_i^V = -\sum_t p_i^t \log p_i^t$.
For each $i$, we construct a search structure $D_i$ of size $O(n)$ that finds $B_i$ in expected $O(H_i^V)$ time.
These $D_i$'s can be constructed using the results of Arya et al. [6], for which the number of primitive
comparisons is $H_i^V + o(H_i^V)$. These correspond to the $D_i$-trees used for sorting.

We will show that the triangles of $T(V)$ do not contain many points of a new input $I \in_R \mathcal{D}$
on average. Consider a triangle $t$ of $T(V)$ and let $C_t$ be its circumscribed disk; this is a Delaunay
disk of $V$. If a point $x_i \in I$ lies in $C_t$, we say that $x_i$ is in conflict with $t$ and call $t$ a
conflict triangle for $x_i$. (The "conflict" terminology arises from the fact that if $x_i$ were added to
$V$, triangles with which it conflicts would no longer be in the Delaunay triangulation.) Let
$Z_t := I \cap C_t$, the random variable that represents the points of $I \in_R \mathcal{D}$ that fall
inside $C_t$, the conflict set of $t$. Furthermore, let $X_t := |Z_t|$. Note that the randomness comes both
from the training inputs, which determine $V$ and $T(V)$, and from the new input $I$. We are interested in
the expectation $E[X_t]$ over $I$ of $X_t$. All expectations are taken over a random input $I$ chosen from
$\mathcal{D}$.

† Assume that we add the vertices of the bounding triangle to $V$. This will ensure that $x_i$ will always
fall in some triangle $B_i$.
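
The conflict test is itself a sign evaluation of a polynomial, in line with the comparison model. A
minimal Python sketch, assuming points are given as coordinate pairs and using the standard in-circle
determinant; the function names are illustrative and not taken from the paper:

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_conflict(a, b, c, p):
    """True if p lies strictly inside the circumscribed disk of triangle abc."""
    # Translate so that p is the origin, then expand the 3x3 in-circle
    # determinant along its last column.
    (ax, ay), (bx, by), (cx, cy) = [(q[0] - p[0], q[1] - p[1]) for q in (a, b, c)]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The determinant is positive for interior points of a counter-clockwise
    # triangle; multiplying by the orientation handles clockwise input too.
    return det * orient(a, b, c) > 0
```

With exact (integer or rational) coordinates the test is exact; with floating-point input it is only
reliable away from degenerate configurations.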

Claim 3.3. With probability at least $1 - n^{-2}$ over the construction of $T(V)$, for every triangle $t$ of
$T(V)$, $E[X_t] = O(1)$ and $E[X_t^2] = O(1)$.

Proof. This is similar to the argument given in Lemma 2.3, with a geometric twist. Let the list of
points $\hat{I}$ (the merged training list) be $s_1, \ldots, s_{\lambda n}$, the concatenation of $I_1$
through $I_\lambda$. Consider the triangle $t$ with vertices $s_1, s_2, s_3$. Note that all the remaining
$\lambda n - 3$ points are chosen independently of these three, from some distribution $\mathcal{D}_j$.
For each $j \in [4, \lambda n]$, let $Y_j^{(t)}$ be the indicator variable for the event that $s_j$
is inside $C_t$. Let $Y^{(t)} = \sum_j Y_j^{(t)}$. By the Chernoff bound, for any $\beta \in (0, 1]$,

$$\Pr\bigl[Y^{(t)} \le (1 - \beta) E[Y^{(t)}]\bigr] \le e^{-\beta^2 E[Y^{(t)}]/2}.$$

Setting $\beta = 1/2$, if $E[Y^{(t)}] > 48 \log n$, then $Y^{(t)} > 24 \log n$ with probability at least
$1 - n^{-6}$. We can now consider any triangle generated by some triple of points $s_i, s_j, s_k$, for
$i, j, k \in [1, \lambda n]$, and apply the same argument as above. Taking a union bound over all triples
of the points in $\hat{I}$, we obtain that with probability at least $1 - n^{-2}$, for any triangle $t$
generated by the points of $\hat{I}$, if $E[Y^{(t)}] > 48 \log n$, then $Y^{(t)} > 24 \log n$. We henceforth
assume that this event happens.

Consider a triangle $t$ of $T(V)$ and its circumcircle $C_t$. Since $T(V)$ is Delaunay, $C_t$ contains no
point of $V$ in its interior. Since $V$ is a $(1/n)$-net for all disks with respect to $\hat{I}$, $C_t$
contains at most $c \log n$ points of $\hat{I}$, that is, $Y^{(t)} \le c \log n$. This implies that
$E[Y^{(t)}] = O(\log n)$, as in the previous paragraph. Since $E[Y^{(t)}] \ge (\log n - 3) E[X_t]$, we
obtain $E[X_t] = O(1)$, as claimed. Furthermore, since $X_t$ can be written as a sum of independent
indicator random variables, $E[X_t^2] = O(1)$, by
Claim El □ 
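
The constants in this proof can be checked by a short calculation. The sketch below assumes natural
logarithms and bounds the number of triples crudely by $(\lambda n)^3$ with $\lambda = c \log n$; both
choices are assumptions made here for illustration rather than statements from the paper.

```latex
\begin{align*}
\Pr\bigl[Y^{(t)} \le \tfrac{1}{2}\,E[Y^{(t)}]\bigr]
  &\le \exp\bigl(-\tfrac{1}{8}\,E[Y^{(t)}]\bigr)
   < \exp(-6 \log n) = n^{-6}
  && \text{if } E[Y^{(t)}] > 48 \log n, \\
(\lambda n)^3 \cdot n^{-6}
  &= \frac{c^3 \log^3 n}{n^3} \le n^{-2}
  && \text{once } n \ge c^3 \log^3 n .
\end{align*}
```

The first line is the Chernoff bound with $\beta = 1/2$, so that $\beta^2/2 = 1/8$; the second is the
union bound over all triples of training points.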

3.1.2 Limiting Phase 

We assume that we are done with the learning phase, and have $T(V)$ with the property given in
Claim 3.3: for every triangle $t \in T(V)$, $E[X_t] = O(1)$ and $E[X_t^2] = O(1)$. We have reached the
limiting phase, where the algorithm is expected to compute the Delaunay triangulation with the
optimal running time. We will prove the following lemma in this section.

Lemma 3.4. Using the data structures from the learning phase, and their properties that
hold with probability $1 - O(1/n)$, in the limiting phase the Delaunay triangulation of input $I$ can
be generated in expected $O(n + \sum_{i=1}^n H_i^V)$ time.

The algorithm, and the proof of this lemma, has two steps. In the first step, $T(V)$ is used
to quickly compute $T(V \cup I)$, with the time bounds of the lemma. In the second step, $T(I)$ is
computed from $T(V \cup I)$, using a randomized splitting algorithm proposed by Chazelle et al. [12],
whose Theorem 3 is as follows.

Theorem 3.5. Given a set of $n$ points $P$ and its Delaunay triangulation, for any partition of $P$
into two disjoint subsets $P_1$ and $P_2$, the Delaunay triangulations $T(P_1)$ and $T(P_2)$ can be computed
in $O(n)$ expected time, using a randomized algorithm.

Figure 2: Proof of Claim 3.6.



The remainder of the proof of the lemma, and of this subsection, is devoted to showing that
$T(V \cup I)$ can be computed in the time bound of the lemma. The algorithm is as follows. For
each $x_i$, we use $D_i$ to find the triangle $t_i$ of $T(V)$ that contains it. By the arguments given in
the previous section, this takes time $O(\sum_{i=1}^n H_i^V)$. We now need to argue that given the $t_i$'s,
the Delaunay triangulation $T(V \cup I)$ can be computed in expected linear time. For each $x_i$, we walk
through $T(V)$ and find all the Delaunay disks of $T(V)$ that contain $x_i$, as in incremental constructions
of Delaunay triangulations. This is done by breadth-first search of the dual graph of $T(V)$, starting
from $t_i$. Let $S_i$ denote the set of circumcircles containing $x_i$. The following standard claim implies
that this procedure will work.

Claim 3.6. The set of $t \in T(V)$ with $C_t \in S_i$ is a connected set in the dual graph of $T(V)$.

Proof. Consider some triangle $t$ with $C_t \in S_i$. We will show that $t$ is connected to $t_i$ by a path in
the dual graph of $T(V)$. Consider the edge $e$ such that $x_i$ is in the sector bounded by $C_t$ and $e$. Let
$t'$ be the neighbor of $t$ adjacent to $e$. Note that since $C_t$ is a Delaunay disk, $C_{t'} \in S_i$. If
$t'$ is $t_i$, we are done. If not, then consider the edge $e'$ such that $x_i$ is in the sector bounded by
$C_{t'}$ and $e'$. Refer to Figure 2. The edge $e'$ is closer to $x_i$ than $e$. We now consider the neighbor
of $t'$ adjacent to $e'$ and continue in this manner. Eventually, we must reach $t_i$ by a connected path in
the dual graph of $T(V)$. □

Claim 3.7. Given all $t_i$'s, all $S_i$ and $Z_t$ sets can be found in expected linear time.

Proof. To find all circles containing $x_i$, do a breadth-first search from $t_i$. For any triangle $t$
encountered, check if $C_t$ contains $x_i$. If it does not, then we do not look at the neighbours of $t$.
Otherwise, add $C_t$ to $S_i$ and $x_i$ to $Z_t$ and continue. By Claim 3.6, we will visit all $C_t$'s that
contain $x_i$. The time taken to find $S_i$ is $O(|S_i|)$. The total time taken to find all $S_i$'s (once all
the $t_i$'s are found) is $O(\sum_{i=1}^n |S_i|)$. Define the indicator function $\chi(t, i)$ that takes
value 1 if $x_i \in C_t$ and zero otherwise. We have

$$\sum_{i=1}^n |S_i| = \sum_{i=1}^n \sum_{t \in T(V)} \chi(t, i)
  = \sum_{t \in T(V)} \sum_{i=1}^n \chi(t, i) = \sum_{t \in T(V)} X_t.$$

Therefore, by Claim 3.3,

$$E\Bigl[\sum_{i=1}^n |S_i|\Bigr] = E\Bigl[\sum_{t \in T(V)} X_t\Bigr]
  = \sum_{t \in T(V)} E[X_t] = O(n).$$

This implies that all $S_i$'s and $Z_t$'s can be found in expected linear time. □
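
The search in the proof of Claim 3.7 can be sketched in Python as follows. The representation is an
assumption made for illustration: `neighbors` maps each triangle id to the ids of its adjacent triangles
in the dual graph of $T(V)$, and `in_conflict(t)` reports whether $C_t$ contains the query point $x_i$.

```python
from collections import deque


def conflict_triangles(start, neighbors, in_conflict):
    """Collect all triangles in conflict with a point, by BFS from `start`.

    `start` is the triangle containing the point, so it is expected to be in
    conflict. Triangles that are not in conflict are never expanded.
    """
    if not in_conflict(start):
        return []
    found, seen, queue = [], {start}, deque([start])
    while queue:
        t = queue.popleft()
        found.append(t)
        for u in neighbors[t]:
            if u not in seen:
                seen.add(u)
                if in_conflict(u):
                    queue.append(u)
    return found
```

Each conflict triangle is dequeued once and inspects a constant number of neighbors, which is where the
$O(|S_i|)$ bound per point comes from.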



Our aim is to build the Delaunay triangulation $T(V \cup I)$ in linear time using the conflict sets
$Z_t$. To that end, we will use divide-and-conquer to compute the Voronoi diagram
$\mathcal{V}(V \cup I)$, using a scheme that has been used for nearest neighbor searching [13] and for
randomized convex hull constructions [11, 14]. It is well known that the Voronoi diagram of a point set
is dual to the Delaunay triangulation, and that we can go from one to the other in linear time. Consider
the Voronoi diagram of $V$, $\mathcal{V}(V)$. By duality, the nodes of $\mathcal{V}(V)$ correspond to the
triangles in $T(V)$, and we identify the two. In particular, each node $t$ of $\mathcal{V}(V)$ has a
conflict set $Z_t$, the conflict set for the corresponding triangle in $T(V)$, and $|Z_t| = X_t$. We
triangulate the Voronoi diagram: for each region $r$ of $\mathcal{V}(V)$, determine the lexicographically
smallest Voronoi node $t_r$ in $r$ with minimum $X_t$. Add edges from all the Voronoi nodes in $r$ to
$t_r$. Since each region of $\mathcal{V}(V)$ is convex, this yields a triangulation of $\mathcal{V}(V)$.
We call it the geode triangulation of $\mathcal{V}(V)$ with respect to $I$, $G_I(V)$.‡ Clearly, $G_I(V)$
can be computed in linear time. We extend the notion of conflict set to the triangles in $G_I(V)$: let
$s$ be a triangle in $G_I(V)$ and let $t_1, t_2, t_3$ be its incident Voronoi nodes. Then the conflict set
of $s$, $Z_s$, is defined as $Z_s := Z_{t_1} \cup Z_{t_2} \cup Z_{t_3} \cup \{v\}$, where $v \in V$ is the
point whose Voronoi region contains the triangle $s$ (see, for example, Figure 2 of [13]).
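
The choice of $t_r$ and the resulting fan of triangles can be sketched in Python. The sketch assumes a
bounded region given as the cyclically ordered list of its Voronoi nodes (coordinate tuples) and a
dictionary `conflict_size` holding $X_t$; it does not handle unbounded regions, and the names are
illustrative.

```python
def geode_fan(region_nodes, conflict_size):
    """Return (apex, triangles) for one convex Voronoi region.

    The apex is the node with minimum conflict size, ties broken by
    lexicographic order of the coordinates. The triangles fan out from the
    apex to every boundary edge that is not incident to the apex.
    """
    nodes = list(region_nodes)
    apex = min(nodes, key=lambda t: (conflict_size[t], t))
    k = nodes.index(apex)
    # Remaining nodes in cyclic order, starting just after the apex.
    ring = nodes[k + 1:] + nodes[:k]
    return apex, [(apex, a, b) for a, b in zip(ring, ring[1:])]
```

A region with $m$ nodes yields $m - 2$ triangles, so the total work over all regions is linear in the
size of the Voronoi diagram.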

Claim 3.8. Let $s$ be a triangle of $G_I(V)$ and let $Z_s$ be its conflict set. Then the Voronoi diagram
of $V \cup I$ restricted to $s$, $\mathcal{V}(V \cup I) \cap s$, is the same as the Voronoi diagram of $Z_s$
restricted to $s$, $\mathcal{V}(Z_s) \cap s$.

Proof. Recall that the Voronoi diagram of a set of points $P \subset \mathbb{R}^2$ can be considered as the
$xy$-projection of the upper envelope of a set $H_P$ of hyperplanes in $\mathbb{R}^3$, where $H_P$ contains
a hyperplane $h_p$ for each point $p \in P$. The hyperplane $h_p$ is obtained by lifting the point
$p = (x, y)$ to the point $\tilde{p} := (x, y, x^2 + y^2)$ on the unit paraboloid and taking the hyperplane
$h_p$ tangent to the unit paraboloid at $\tilde{p}$. The hyperplane $h_p$ is called the lifted hyperplane
for $p$. Each Voronoi node $x$ of $\mathcal{V}(P)$ corresponds to the intersection $\tilde{x}$ of three
hyperplanes in $H_P$. Furthermore, a point $q \in \mathbb{R}^2$ is in conflict with Voronoi node $x$ if and
only if the lifted hyperplane $h_q$ for $q$ is intersected by the infinite upward ray extending from
$\tilde{x}$.

Now consider $\mathcal{V}(V)$ as an upper envelope of the set of lifted hyperplanes $H_V$. Let $v \in V$ be
the point whose Voronoi region contains $s$. Then $s$ is the $xy$-projection of a triangle $\tilde{s}$
contained in the lifted hyperplane $h_v$ for $v$. Let $Q$ be the unbounded polytope obtained by extending
$\tilde{s}$ in positive $z$-direction. Then $\mathcal{V}(V \cup I) \cap s$ corresponds to the intersection
of the upper envelope of $H_I$ with $Q$. But each hyperplane in $H_I$ that intersects $Q$ also intersects at
least one infinite upward ray extending from a vertex of $\tilde{s}$,§ and hence all the relevant
hyperplanes correspond to points in $Z_s$. □
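
The lifting map used in this proof can be checked numerically. In the Python sketch below (names are
illustrative), the tangent plane above $p = (p_x, p_y)$ is $z = 2 p_x x + 2 p_y y - (p_x^2 + p_y^2)$, and
the site whose plane is highest above a query point $q$ is the site nearest to $q$, because
$|q|^2 - h_p(q) = |q - p|^2$.

```python
def lifted_plane(p):
    """Coefficients (a, b, c) of the plane z = a*x + b*y + c tangent to the
    paraboloid z = x^2 + y^2 above the point p."""
    px, py = p
    return (2 * px, 2 * py, -(px * px + py * py))


def nearest_site_via_envelope(sites, q):
    """Site whose lifted plane attains the upper envelope above q."""
    def height(p):
        a, b, c = lifted_plane(p)
        return a * q[0] + b * q[1] + c
    return max(sites, key=height)


def nearest_site_brute_force(sites, q):
    """Reference answer: the site at minimum squared Euclidean distance."""
    return min(sites, key=lambda p: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)
```

Agreement of the two functions on a query point is exactly the statement that the Voronoi diagram is the
projection of the upper envelope of the lifted planes.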



Claim 3.8 implies that $\mathcal{V}(V \cup I)$ can be computed as follows: for each triangle $s$ of
$G_I(V)$, compute $\mathcal{V}(Z_s) \cap s$, the Voronoi diagram of $Z_s$ restricted to $s$. Then, by
traversing the edges of $G_I(V)$ and fusing the bisectors of the restricted diagrams, put all the
$\mathcal{V}(Z_s) \cap s$ together to obtain $\mathcal{V}(V \cup I)$.

‡ We need to be a bit careful when handling unbounded Voronoi regions: we pretend that there is a Voronoi
node $p_\infty$ at infinity which is the endpoint of all unbounded Voronoi edges, and when we triangulate
the unbounded region, we also add edges to $p_\infty$. By our bounding triangle assumption, there is no
point in $I$ outside the convex hull of $V$ and hence the conflict set of $p_\infty$ is empty.

§ This also holds for unbounded regions, since the only unbounded regions on the upper envelope of
$H_{V \cup I}$ correspond to the vertices of the bounding triangle.

Lemma 3.9. The Voronoi diagram $\mathcal{V}(V \cup I)$ can be computed in expected $O(n)$ time.

Proof. The time to compute $\mathcal{V}(Z_s) \cap s$ for a triangle $s \in G_I(V)$ is
$O(|Z_s| \log |Z_s|) = O(|Z_s|^2)$. For a region $r$ of $\mathcal{V}(V)$, let $S(r)$ denote the set of
triangles of $G_I(V)$ contained in $r$, let $E(r)$ denote the set of edges in $\mathcal{V}(V)$ incident to
$r$, and let $N(r)$ denote the set of nodes in $\mathcal{V}(V)$ incident to $r$. Recall that $t_r$ denotes
the common vertex of all triangles in $S(r)$. Then the running time is proportional to

$$E\Bigl[\sum_{s \in G_I(V)} |Z_s|^2\Bigr]
  = E\Bigl[\sum_{r \in \mathcal{V}(V)} \sum_{s \in S(r)} |Z_s|^2\Bigr]
  \le E\Bigl[\sum_{r \in \mathcal{V}(V)} \sum_{e = (t_1, t_2) \in E(r)} (1 + 2X_{t_1} + X_{t_2})^2\Bigr],$$

since $|Z_s| \le 1 + X_{t_r} + X_{t_1} + X_{t_2}$ for the triangle $s$ spanned by $t_r$ and the edge
$e = (t_1, t_2)$, and $X_{t_r} \le \min(X_{t_1}, X_{t_2})$. For $e = (t_1, t_2)$, let
$Y_e = 1 + 2X_{t_1} + X_{t_2}$. Note that $E[Y_e] = O(1)$. We can express
$Y_e = \sum_{i=1}^n (1/n + 2\chi(t_1, i) + \chi(t_2, i))$, and
$(1/n + 2\chi(t_1, i) + \chi(t_2, i)) < 4$. By Claim [231
$E[Y_e^2] = O(1)$. The number of edges in a Voronoi diagram is linear and each edge appears in two
Voronoi regions. Therefore, the sum is in $O(n)$. Furthermore, assembling the restricted diagrams
takes time $O\bigl(E\bigl[\sum_{s \in G_I(V)} |Z_s|\bigr]\bigr)$, and as $|Z_s| \le |Z_s|^2$, this is also
linear. □



3.2 Running time analysis 

In this section, we prove that the running time bound in Lemma 3.4 is indeed optimal. Before we go on,
it is important to clarify the model of computation. We are using comparison-based algorithms,
where a single step (or "comparison") involves evaluating some polynomial
$f(z_1, z_2, \ldots, z_d) : \mathbb{R}^d \to \mathbb{R}$ at a point
$(z_1, z_2, \ldots, z_d) \in \mathbb{R}^d$ (for constant $d$) and checking if the result is positive or
negative. Based on this result, the algorithm chooses the next comparison to make. An algorithm can
be completely represented by a decision tree, with each node representing some comparison. In
this model, we get an information-theoretic lower bound of $H(T(I))$ for computing the Delaunay
triangulation of input $I \in_R \mathcal{D}$.

Recall that by Lemma 3.4, the running time of our algorithm is expected $O(n + \sum_i H_i^V)$.
The aim of this section is to prove the optimality of the algorithm by the following theorem.

Theorem 3.10. For $H_i^V$, the entropy of the triangle $t_i$ of $T(V)$ containing $x_i$, and $H(T(I))$, the
entropy of the Delaunay triangulation of $I$, considered as a labelled graph,

$$\sum_i H_i^V = O(n + H(T(I))).$$

Proof. The theorem is proved by an application of Claim 2.7, with $U = (\mathbb{R}^2)^n$, $X = T(I)$ and
$Y = (t_1, \ldots, t_n)$, the set of conflict triangles for $I$. In Lemma 3.11 we will show that the function
$f : (I, T(I)) \mapsto (t_1, \ldots, t_n)$ can be computed in linear time. The theorem now follows by
Claims [231 and [221 □

1. Let $Q$ be a queue containing the elements in $V$.

2. While $Q \neq \emptyset$:

   (a) Let $p$ be the next point in $Q$.

   (b) If $p = x_i \in I$, then insert $p$ into $T(V)$ using the conflict triangle $u_i$ for $x_i$.

   (c) Using Claim 3.12, for each unvisited neighbor $x_j \in \Gamma_{V \cup I}(p) \cap I$, compute a
       conflict triangle $u'_j$ in $T(V \cup \{p\})$.

   (d) For each unvisited neighbor $x_j \in \Gamma_{V \cup I}(p) \cap I$, using $u'_j$, compute a conflict
       triangle $u_j$ in $T(V)$. Then insert $x_j$ into $Q$, and mark it as visited.

Figure 3: Determining the conflict triangles.

We first define some notation: for a point set $P \subseteq V \cup I$ and $p \in P$, let $\Gamma_P(p)$
denote the neighbors of $p$ in $T(P)$. It remains to prove the following lemma:

Lemma 3.11. Given $I$ and $T(I)$, for each $x_i$ in $I$ we can compute the triangle $t_i$ in $T(V)$ that
contains $x_i$ in $O(n)$ total expected time.

Proof. First, we compute $T(V \cup I)$ from $T(V)$ and $T(I)$ in linear time, using an algorithm by
Chazelle [10]. We now show how to compute the conflict triangles in a special case.

Claim 3.12. Let $J \subseteq I$ and assume that $T(V \cup I)$ and $T(V \cup J)$ are known. Furthermore, let
$p \in V \cup J$. Then, in total time $O(|\Gamma_{V \cup I}(p)| + |\Gamma_{V \cup J}(p)|)$, for every
$x_i \in \Gamma_{V \cup I}(p) \setminus (V \cup J)$, we can compute a conflict triangle $u_i$ in
$T(V \cup J)$.

Proof. Let $x_i \in \Gamma_{V \cup I}(p) \setminus (V \cup J)$, and let $u_i$ be the triangle of
$T(V \cup J)$ incident to $p$ that is intersected by line segment $p x_i$. We claim that $u_i$ is a conflict
triangle for $x_i$. Indeed, since $p x_i$ is an edge of $T(V \cup I)$, by the characterization of Delaunay
edges (e.g., [17, Theorem 9.6(ii)]), there exists a circle $C$ through $p$ and $x_i$ which does not contain
any other points from $V \cup I$. In particular, $C$ does not contain any other points from
$V \cup J \cup \{x_i\}$, and hence $p x_i$ is also an edge of $T(V \cup J \cup \{x_i\})$, again by the
characterization of Delaunay edges applied in the other direction. Hence, triangle $u_i$ is destroyed when
$x_i$ is inserted into $T(V \cup J)$, and thus $x_i$ is in conflict with $u_i$.

It follows that the conflict triangles for $\Gamma_{V \cup I}(p) \setminus (V \cup J)$ can be computed by
merging the cyclically ordered lists $\Gamma_{V \cup I}(p)$ and $\Gamma_{V \cup J}(p)$. This can be done in
the claimed time. □

The conflict triangles for $I$ can now be computed using breadth-first search (see Figure 3). The
loop in Step 2 maintains the invariant that for each point $x_i \in Q \cap I$, a conflict triangle $u_i$ in
$T(V)$ is known. Next, we claim that Step 2d can be done in constant time:

Claim 3.13. Let $x_i, x_j \in I$, and let $u'_i$ be a conflict triangle for $x_i$ in $T(V \cup \{x_j\})$
incident to $x_j$. Then we can find a conflict triangle $u_i$ for $x_i$ in $T(V)$ in constant time.

Proof. Let $e$ be the edge of $u'_i$ not incident to $x_j$, and let $v, w$ be the endpoints of $e$ (refer to
Figure 4). Since $v, w \in V$, by the characterization of Delaunay edges, it follows that $e$ is also an
edge of $T(V)$. Furthermore, since $u'_i$ is in conflict with $x_i$, we know that $v x_i$ and $w x_i$ are
edges of $T(V \cup \{x_i, x_j\})$, and hence also edges of $T(V \cup \{x_i\})$. But this means that $x_i$ is
in conflict with at least one of the two triangles in $T(V)$ that are incident to $e$. Given $e$, such a
triangle can clearly be found in constant time. □

Figure 4: Proof of Claim 3.13.



The while-loop in Step 2 is executed at most once for each $p \in V \cup I$. It is also executed at least
once for each point, since $T(V \cup I)$ is connected and in Step 2d we perform a BFS. The insertion
in Step 2b takes $O(|\Gamma_{V \cup \{x_i\}}(x_i)|)$ time. Furthermore, by Claim 3.12, the conflict
triangles of $p$'s neighbors in $T(V \cup I)$ can be computed in
$O(|\Gamma_{V \cup \{p\}}(p)| + |\Gamma_{V \cup I}(p)|)$ time. Finally, as we argued above, Step 2d can be
carried out in $O(|\Gamma_{V \cup I}(p)|)$ time. Now note that for $x_i \in I$,
$|\Gamma_{V \cup \{x_i\}}(x_i)|$ is proportional to $|S_i|$, the number of triangles in $T(V)$ in conflict
with $x_i$. Hence, the total expected running time is proportional to

$$E\Bigl[\sum_{p \in V \cup I} \bigl(|\Gamma_{V \cup \{p\}}(p)| + |\Gamma_{V \cup I}(p)|\bigr)\Bigr]
  = O\Bigl(E\Bigl[\sum_{v \in V} |\Gamma_V(v)| + \sum_{i=1}^n |S_i|
      + \sum_{p \in V \cup I} |\Gamma_{V \cup I}(p)|\Bigr]\Bigr) = O(n).$$

Finally, using BFS as in the proof of Claim 3.7, given the conflict triangles $u_i$, the triangles $t_i$ can
be found in $O(n)$ time, and the result follows. □



3.3 The time-space tradeoff 

We show how to remove the assumption that we have prior knowledge of the $\mathcal{D}_i$'s (to build the
search trees $D_i$) and prove the time-space tradeoff given in Theorem 3.1. These techniques are
identical to those used in Section 2.2. For the sake of clarity, we give a detailed explanation for
this setting. Let $\varepsilon > 0$ be any constant. The first $O(\log n)$ rounds of the learning phase are
used as before to construct the Delaunay triangulation $T(V)$. We first build a standard search structure
$D$ over the triangles of $T(V)$. Given a point $x$, we can find the triangle of $T(V)$ that contains $x$ in
$O(\log n)$ time.

The learning phase goes on for $O(n^{\varepsilon} \log n)$ rounds. The main trick is to observe that (up to
constant factors) the only probabilities that are relevant are those that are $> n^{-\varepsilon}$. In each
round, for each $x_i$, we record the triangle of $T(V)$ that $x_i$ falls into. At the end of
$O(n^{\varepsilon} \log n)$ rounds, we take the set $R_i$ of triangles such that for $t \in R_i$, $x_i$ was
in $t$ for at least $\Omega(\log n)$ rounds. We remind the reader that $p(t, i)$ is the probability that
$x_i$ lies in triangle $t$. For every triangle in $R_i$, we have an estimate $\hat{p}(t, i)$ of the
probability $p(t, i)$ (obtained by simply taking the total number of times that $x_i$ lay in $t$, divided
by the total number of rounds). By a standard Chernoff bound argument, for all $t \in R_i$,
$\hat{p}(t, i) = \Theta(p(t, i))$. Furthermore, for any triangle $t$, if $p(t, i) = \Omega(n^{-\varepsilon})$,
then $t \in R_i$.
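
The bookkeeping for a single index $i$ can be sketched in Python. The input format is an assumption
made for illustration: `observations` lists, round by round, the id of the triangle of $T(V)$ that $x_i$
fell into, and `min_count` stands in for the $\Omega(\log n)$ threshold.

```python
from collections import Counter


def frequent_triangles(observations, min_count):
    """Return {triangle: estimated probability} for the set R_i.

    A triangle is kept if x_i fell into it in at least `min_count` rounds;
    its estimate is the hit count divided by the total number of rounds.
    """
    rounds = len(observations)
    counts = Counter(observations)
    return {t: c / rounds for t, c in counts.items() if c >= min_count}
```

Rarely hit triangles are dropped, which is what keeps each structure $D_i$ small.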

For each $x_i$, we build the approximate search structure $D_i$. Consider the following probability
distribution $\bar{p}_i$ over the triangles of $T(V)$: if $t \in R_i$, set
$\bar{p}(t, i) := \hat{p}(t, i)/N_i$, where $N_i := \sum_{t \in R_i} \hat{p}(t, i)$; otherwise
$\bar{p}(t, i) := 0$. Using the construction of [6], we can build the optimal planar point location
structure $D_i$ according to the distribution $\bar{p}_i$. The limiting phase uses these structures to find
$t_i$ for every $x_i$: given $x_i$, we use $D_i$ to search for it. If the search does not terminate in
$\log n$ steps or $D_i$ fails to find $t_i$ (since $t_i \notin R_i$), then we use the standard search
structure, $D$, to find $t_i$. Therefore, we are guaranteed to find $t_i$ in $O(\log n)$ time. Without loss
of generality, we can assume that each $D_i$ deals with only $n^{\varepsilon}$ triangles (and therefore, a
planar subdivision of size $n^{\varepsilon}$). By the bounds given in [6], each $D_i$ can be constructed
with size $n^{\varepsilon}$ in $n^{\varepsilon} \log n$ time. The total space is bounded by
$n^{1+\varepsilon}$ and the time required to build them is at most $n^{1+\varepsilon} \log n$.

Now we just repeat the argument given in Section 2.2. Instead of doing it through words, we
write down the expressions (for some variety). Let $s(t, i)$ denote the time to search for $x_i$ given
that $x_i \in t$. By the properties of $D_i$, and noting that $N_i \le 1$,

$$\sum_t \bar{p}(t, i) s(t, i) = \sum_t \bar{p}(t, i) \log(1/\bar{p}(t, i))
  = N_i^{-1} \sum_{t \in R_i} \hat{p}(t, i) \log(N_i/\hat{p}(t, i))
  \le -N_i^{-1} \sum_{t \in R_i} \hat{p}(t, i) \log \hat{p}(t, i).$$

We now bound the expected search time for $x_i$:

$$\sum_t p(t, i) s(t, i) = \sum_{t \in R_i} p(t, i) s(t, i) + \sum_{t \notin R_i} p(t, i) s(t, i).$$

Noting that for $t \notin R_i$, $p(t, i) = O(n^{-\varepsilon})$ and therefore
$\log p(t, i) \le -\varepsilon \log n + O(1)$, the $O(\log n)$ search time for such $t$ is
$O(\varepsilon^{-1}(1 + \log(1/p(t, i))))$.

The total expected search time is $O(\varepsilon^{-1}(n + \sum_i H_i^V))$. By the analysis of Section 3.2
and Theorem 3.10, we have that the expected running time in the limiting phase is
$O(\varepsilon^{-1}(n + H(T(I))))$. This completes the proof of Theorem 3.1.